

THE ANALYST. 



Vol. V. January, 1878. No. 1. 



ON THE GROUPING OF SIGNS OF RESIDUALS. 



BY E. L. DE FOREST, A. M., WATERTOWN, CONN. 

When a series of equidistant terms, affected by errors of observation, is 
adjusted by means of such a formula as 

$$ u'_0 = a_0 u_0 + a_1(u_1 + u_{-1}) + a_2(u_2 + u_{-2}) + \cdots + a_m(u_m + u_{-m}), \tag{1} $$

the extent to which the adjusting process should be carried can often be 
determined by observing the fortuitous distribution of the signs of the 
residuals v which result from subtracting each adjusted term from the corresponding observed term. (See the Analyst for July, 1877, p. 111, and
the pamphlet Interpolation and Adjustment of Series, p. 31.) This method 
of testing an adjustment presupposes that the errors of consecutive terms are 
wholly independent of each other, so that the fact that a particular observed 
term is, for instance, in excess of its normal value, shall be no reason for 
presuming that the next term will be also in excess, or the reverse. The 
method would not apply to such a case as the determination of the normal 
curve of annual temperature from a series of daily means covering only a 
single year, because variations from the normal occur in waves of heat or 
cold, extending over periods of several days each. But it does apply to 
many kinds of physical and statistical series, as for instance to the experi- 
ence table of mortality discussed in the pamphlet referred to. The normal or 
probable distribution of signs was there ascertained, on the assumption that 
the adjusted series is the true one, and it was inferred that the adjusting 
process ought to be carried far enough to secure such a distribution, within 
certain ascertained limits of probable error. This standard of adjust- 
ment, however, is in some degree an ideal one, which we ought not to 
expect fully to attain even with the best adjustment formulas we can use, 
because the adjusted series can hardly ever be the true one, and the actual 
residuals are not the true errors. The distribution of signs will depend somewhat upon the ratio existing between the probable errors $\varepsilon'$ and $\varepsilon$ of the adjusted and the observed terms. The case is analogous to that of the other test proposed, where the mean error $\varepsilon_1$ of each observed term is supposed to be known, and it was shown that, assuming the adjusted series to be the true one, the arithmetical mean of all the values of $(v/\varepsilon_1)^2$ for the $N$ terms of the series would be

$$ 1 \pm .6745\sqrt{2/N}. \tag{2} $$

But it was also shown (Analyst, p. 109, Vol. IV) that, when we use an adjustment formula whose error-ratio is

$$ r = \frac{\varepsilon'}{\varepsilon} = \sqrt{a_0^2 + 2(a_1^2 + a_2^2 + \cdots + a_m^2)}, $$

the mean of $(v/\varepsilon_1)^2$ will most probably be

$$ (1 - r^2)\left(1 \pm .6745\sqrt{2/N}\right), \tag{3} $$

which agrees with (2) only in the extreme case, when we suppose the adjusting process to be perfect, so that $\varepsilon' = 0$ and $r = 0$. The difference between (2) and (3) is rather too large to be disregarded. This test cannot be used when we have not the data for finding $\varepsilon_1$, but we may use equally well, if the nature of the errors justifies it, the test afforded by the observed grouping of the signs of the residuals. It is the object of the present paper to ascertain, as far as may be, what the probable distribution of these signs will be for any given value of $r$.
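As a minimal sketch of the quantities defined so far, the Python below applies a symmetric formula of type (1) to a series, forms the residuals $v$ (observed minus adjusted), computes the error-ratio $r$ from the coefficients, and evaluates the limits of formula (3). The function names, the coefficient list layout `[a_0, a_1, ..., a_m]`, and the choice to wrap around at the ends (treating the series as periodic) are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def error_ratio(weights: Sequence[float]) -> float:
    """r = sqrt(a0^2 + 2*(a1^2 + ... + am^2)) for weights [a0, a1, ..., am]."""
    a0, *rest = weights
    return math.sqrt(a0 * a0 + 2 * sum(a * a for a in rest))


def adjust_periodic(u: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Formula (1): u'_k = a0*u_k + sum_i a_i*(u_{k+i} + u_{k-i}).

    Indices wrap around, i.e. the series is treated as periodic (an assumption).
    """
    n = len(u)
    a0, *rest = weights
    adjusted = []
    for k in range(n):
        value = a0 * u[k]
        for i, a in enumerate(rest, start=1):
            value += a * (u[(k + i) % n] + u[(k - i) % n])
        adjusted.append(value)
    return adjusted


def residuals(observed: Sequence[float], adjusted: Sequence[float]) -> List[float]:
    """v = observed term minus adjusted term."""
    return [o - a for o, a in zip(observed, adjusted)]


def formula_3_limits(r: float, n_terms: int) -> Tuple[float, float, float]:
    """(1 - r^2)(1 -/+ .6745*sqrt(2/N)) as (lower, most probable, upper)."""
    centre = 1.0 - r * r
    pe = 0.6745 * math.sqrt(2.0 / n_terms)
    return centre * (1.0 - pe), centre, centre * (1.0 + pe)


if __name__ == "__main__":
    w = [10 / 16, 4 / 16, -1 / 16]   # the five-term formula quoted later in the paper
    print(round(error_ratio(w), 3))  # 0.723
```

Because the coefficients of a formula of this kind sum to one, a constant series passes through `adjust_periodic` unchanged and all its residuals vanish.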

The terms of the adjusted series may be regarded as occupying an inter- 
mediate position among the successive observed terms, similar to that which 
an arithmetical mean occupies with reference to several observed values of 
a single quantity. (Compare the Analyst, p. 108.) Let $n_1$ be a number such that the ratio which the probable error of the mean of $n_1$ such values bears to that of a single value shall be equal to the error-ratio $r$, so that we have

$$ r = \frac{1}{\sqrt{n_1}}. \tag{4} $$

In a group of $n_1$ successive values, it is most probable that half of the terms will be in excess of the mean, and the other half in defect, so that if a particular residual is positive, for instance, it is most likely that the number of other positive residuals in the group will be $\tfrac12 n_1 - 1$, while the number of negative ones will be $\tfrac12 n_1$. Hence, the probability that another residual in the group, that is to say, the residual next to that which we have supposed positive, will be likewise positive, is

$$ \frac{\tfrac12 n_1 - 1}{n_1 - 1}, $$

and giving to $n_1$ its value from (4), we find that the probability that any two consecutive residuals will have like signs is

$$ q = \frac{1 - 2r^2}{2 - 2r^2}. \tag{5} $$

If $r$ equals or exceeds $\sqrt{1/2} = 0.707$, then $q$ becomes zero or negative, and it appears impossible, in such a case, for any two consecutive residuals to have like signs. It is not really impossible, however, as can be seen by trial with, for example, the formula

$$ u'_0 = \tfrac{1}{16}\left[10u_0 + 4(u_1 + u_{-1}) - (u_2 + u_{-2})\right], $$

where $r = 0.723$. Formula (5), therefore, does not give a rigorously true value of $q$, but it is found to give a good approximate value, for such adjustment formulas as are likely to be used in practice, at least when $r < \tfrac12$. It is strictly true when $r = 0$.
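The chain from (4) to (5) can be checked numerically. The sketch below, whose function names are illustrative assumptions, computes $n_1 = 1/r^2$ from (4), the probability $(\tfrac12 n_1 - 1)/(n_1 - 1)$, and the closed form (5); it also shows that (5) turns negative for the formula with divisor 16 quoted above, whose error-ratio is about 0.723.

```python
import math


def n1_from_r(r: float) -> float:
    """Formula (4): r = 1/sqrt(n1), hence n1 = 1/r^2 (requires 0 < r < 1)."""
    return 1.0 / (r * r)


def q_from_n1(n1: float) -> float:
    """Chance that the neighbour of a positive residual is also positive."""
    return (n1 / 2.0 - 1.0) / (n1 - 1.0)


def q_from_r(r: float) -> float:
    """Formula (5): q = (1 - 2r^2) / (2 - 2r^2)."""
    s = r * r
    return (1.0 - 2.0 * s) / (2.0 - 2.0 * s)


def error_ratio(weights) -> float:
    """r = sqrt(a0^2 + 2*(a1^2 + ... + am^2)) for weights [a0, a1, ..., am]."""
    a0, *rest = weights
    return math.sqrt(a0 * a0 + 2 * sum(a * a for a in rest))


if __name__ == "__main__":
    r = error_ratio([10 / 16, 4 / 16, -1 / 16])
    print(round(r, 3), q_from_r(r) < 0)  # 0.723 True
```

A negative $q$ here is exactly the breakdown the paragraph describes: the approximation behind (5) is not meant for error-ratios this large.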

The probability that any group of $n$ consecutive residuals will have like signs is approximately $q^{n-1}$, provided that $n$ is a small number, and less than $n_1$. The probability that such a group of signs will be both alike and isolated, that is, different from the signs next preceding and following it, is

$$ p = q^{n-1}(1 - q)^2, $$

and consequently

$$ p = \frac{(1 - 2r^2)^{n-1}}{(2 - 2r^2)^{n+1}}. \tag{6} $$
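A short sketch (illustrative names, sample value $r = .372$) confirms that (6) is the same quantity as $q^{n-1}(1-q)^2$, and adds one consistency check not made in the paper: every sign lies in exactly one run, so the sum of $n\,p$ over all $n$ should equal 1. It does, since $\sum_n n q^{n-1}(1-q)^2 = 1$ for $0 \le q < 1$.

```python
def q_from_r(r: float) -> float:
    """Formula (5): probability that two consecutive signs are alike."""
    s = r * r
    return (1.0 - 2.0 * s) / (2.0 - 2.0 * s)


def p_isolated(n: int, r: float) -> float:
    """Formula (6): probability that n consecutive signs are alike and isolated."""
    s = r * r
    return (1.0 - 2.0 * s) ** (n - 1) / (2.0 - 2.0 * s) ** (n + 1)


if __name__ == "__main__":
    r = 0.372
    q = q_from_r(r)
    for n in (1, 2, 3):
        # The two columns agree: (6) equals q^(n-1) * (1 - q)^2.
        print(n, round(p_isolated(n, r), 4), round(q ** (n - 1) * (1 - q) ** 2, 4))
    # Each sign belongs to exactly one run, so sum of n*p over n is 1.
    print(round(sum(n * p_isolated(n, r) for n in range(1, 200)), 6))
```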

In a series of N signs in all, the whole number of possible groups of n 
consecutive signs each will be N if the series is periodic, that is, if its first 
and last terms are consecutive. Hence by well known principles, the most 
probable number of isolated groups of n like signs occurring from accidental 
causes, is for a periodic series approximately 

$$ P = pN \pm .6745\sqrt{p(1 - p)N}. \tag{7} $$
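Formula (7) applies to each group size $n$ separately. The following is a minimal sketch with illustrative function names; the inputs $r = .311$ and $N = 100$ in the usage line are sample values only.

```python
import math
from typing import Tuple


def p_isolated(n: int, r: float) -> float:
    """Formula (6)."""
    s = r * r
    return (1.0 - 2.0 * s) ** (n - 1) / (2.0 - 2.0 * s) ** (n + 1)


def isolated_groups_periodic(n: int, r: float, n_signs: int) -> Tuple[float, float]:
    """Formula (7): most probable count pN and probable error .6745*sqrt(p(1-p)N)."""
    p = p_isolated(n, r)
    return p * n_signs, 0.6745 * math.sqrt(p * (1.0 - p) * n_signs)


if __name__ == "__main__":
    for n in range(1, 5):
        count, pe = isolated_groups_periodic(n, 0.311, 100)
        print(f"isolated groups of {n}: {count:.1f} ± {pe:.1f}")
```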

If we give to $n$ in (6) the values 1 and 2, we have

$$ p_1 = \frac{1}{4(1 - r^2)^2}, \qquad p_2 = \frac{1 - 2r^2}{8(1 - r^2)^3}, $$

and the total number of signs included in groups of only one or two signs each is

$$ M = (p_1 + 2p_2)N \pm .6745\sqrt{[p_1(1 - p_1) + 4p_2(1 - p_2)]N}. $$

Assigning to $p_1$ and $p_2$ their values as above, expanding binomials, and neglecting the higher powers of $r^2$ in the probable error, we have with sufficient accuracy when $r < \tfrac12$,

$$ M = \frac{(2 - 3r^2)N}{4(1 - r^2)^3} \pm .533\left(1 + \tfrac12 r^2\right)\sqrt{N}. \tag{8} $$
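The closed form (8) can be compared with the unexpanded expression above. In the sketch below (function names are illustrative), `m_periodic_exact` evaluates $(p_1 + 2p_2)N$ and its probable error directly, and `m_periodic_formula8` evaluates (8). The central terms agree exactly, since $p_1 + 2p_2 = (2 - 3r^2)/4(1 - r^2)^3$; only the probable error is approximated. For $r = .311$ and $N = 100$, the values of the fictitious periodic series reported later in the paper, (8) gives the figure 58.0 ± 5.6 stated there.

```python
import math
from typing import Tuple


def p_isolated(n: int, r: float) -> float:
    """Formula (6)."""
    s = r * r
    return (1.0 - 2.0 * s) ** (n - 1) / (2.0 - 2.0 * s) ** (n + 1)


def m_periodic_exact(r: float, n_signs: int) -> Tuple[float, float]:
    """(p1 + 2p2)N and .6745*sqrt([p1(1-p1) + 4p2(1-p2)]N), before expansion."""
    p1, p2 = p_isolated(1, r), p_isolated(2, r)
    centre = (p1 + 2.0 * p2) * n_signs
    pe = 0.6745 * math.sqrt((p1 * (1 - p1) + 4 * p2 * (1 - p2)) * n_signs)
    return centre, pe


def m_periodic_formula8(r: float, n_signs: int) -> Tuple[float, float]:
    """Formula (8): (2 - 3r^2)N / 4(1 - r^2)^3 and .533(1 + r^2/2)sqrt(N)."""
    s = r * r
    centre = (2.0 - 3.0 * s) * n_signs / (4.0 * (1.0 - s) ** 3)
    pe = 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs)
    return centre, pe


if __name__ == "__main__":
    m, pe = m_periodic_formula8(0.311, 100)
    print(f"M = {m:.1f} ± {pe:.1f}")  # M = 58.0 ± 5.6
```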



A perfect adjustment, making $r = 0$, would give

$$ M = \tfrac12 N \pm .533\sqrt{N}, \tag{9} $$

as was found in the Analyst, p. 111, Vol. IV. Actually, an adjusted series will always have some probable error, and the most probable value of $M$ as given by (8) will always be somewhat larger than $\tfrac12 N$. And since $q$ in formula (5) is less than $\tfrac12$, making it more probable that any two consecutive signs will be unlike than that they will be alike, it follows that the largest group likely to occur will contain a rather smaller number of signs than

$$ 1 + 3.32\log_{10} N, $$

which is the number found in the Analyst referred to.

Formulas (7) and (8) will also apply to the case of ordinary or non-peri- 
odic series, for purposes of rough estimation, provided we treat the first and 
last residuals as consecutive, so that their signs, if they are alike, belong to 
the same group. A more precise determination is as follows. The first and 
last signs of the series, and as many adjacent signs as happen to be like 
them, cannot properly belong to any isolated group at all, the next signs beyond being unknown. Omitting the first and last sign, the whole number of possible isolated groups of $n$ signs each will be $N - n - 1$, and the number of such groups which will most probably occur is

$$ P = p(N - n - 1) \pm .6745\sqrt{p(1 - p)(N - n - 1)}. \tag{10} $$
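Counting the observed $M$ is mechanical once the signs are written out. The sketch below splits a sign sequence into runs of like signs. For a periodic series it joins the first and last runs when their signs agree. For an ordinary series it discards the first and last runs, which, as explained above, cannot be counted as isolated groups. It then adds up the signs lying in runs of one or two. The function names and the `'+'`/`'-'` string encoding are assumptions made for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def run_lengths(signs: Sequence[str]) -> List[int]:
    """Lengths of the successive runs of like signs, e.g. '+-++' -> [1, 1, 2]."""
    return [len(list(group)) for _, group in groupby(signs)]


def count_m(signs: Sequence[str], periodic: bool, max_run: int = 2) -> int:
    """Number of signs lying in isolated runs of at most `max_run` like signs."""
    lengths = run_lengths(signs)
    if len(lengths) < 2:
        return 0  # all signs alike: no run has an unlike neighbour
    if periodic:
        # First and last terms are consecutive: merge the end runs if they agree.
        if signs[0] == signs[-1]:
            lengths = [lengths[0] + lengths[-1]] + lengths[1:-1]
    else:
        # End runs border on unknown signs, so they are not isolated groups.
        lengths = lengths[1:-1]
    return sum(n for n in lengths if n <= max_run)


if __name__ == "__main__":
    s = "+-++---+"
    print(count_m(s, periodic=True), count_m(s, periodic=False))  # 5 3
```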

Proceeding as before, we find for the probable total number of signs included in groups of only one or two signs each

$$ M = p_1(N - 2) + 2p_2(N - 3) \pm .6745\sqrt{p_1(1 - p_1)(N - 2) + 4p_2(1 - p_2)(N - 3)}. $$

Assigning to $p_1$ and $p_2$ their values, expanding binomials and neglecting higher powers of $r^2$ and other negligible quantities, we get with sufficient accuracy when $r < \tfrac12$,

$$ M = \frac{(2 - 3r^2)(N - 2.5)}{4(1 - r^2)^3} \pm .533\left(1 + \tfrac12 r^2\right)\sqrt{N - 3}. \tag{11} $$
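The passage from the unexpanded expression to (11) can be traced as follows. With the values of $p_1$ and $p_2$, the central term $p_1(N-2) + 2p_2(N-3)$ equals $[(2 - 3r^2)N - (5 - 8r^2)]/4(1 - r^2)^3$, and $5 - 8r^2$ differs from $2.5(2 - 3r^2)$ only by $\tfrac12 r^2$, which is presumably the "negligible quantity" dropped to obtain the factor $N - 2.5$. The sketch below (hypothetical function names) evaluates both versions. For the two adjustments of the mortality table discussed below ($r = .372$ and $r = .537$, $N = 70$), (11) gives 41.8 ± 4.7 and 53.1 ± 5.0.

```python
import math
from typing import Tuple


def p_isolated(n: int, r: float) -> float:
    """Formula (6)."""
    s = r * r
    return (1.0 - 2.0 * s) ** (n - 1) / (2.0 - 2.0 * s) ** (n + 1)


def m_ordinary_exact(r: float, n_signs: int) -> Tuple[float, float]:
    """Sum of (10) over n = 1 and 2, weighted by group size, before expansion."""
    p1, p2 = p_isolated(1, r), p_isolated(2, r)
    centre = p1 * (n_signs - 2) + 2.0 * p2 * (n_signs - 3)
    var = p1 * (1 - p1) * (n_signs - 2) + 4 * p2 * (1 - p2) * (n_signs - 3)
    return centre, 0.6745 * math.sqrt(var)


def m_ordinary_formula11(r: float, n_signs: int) -> Tuple[float, float]:
    """Formula (11) for an ordinary (non-periodic) series."""
    s = r * r
    centre = (2.0 - 3.0 * s) * (n_signs - 2.5) / (4.0 * (1.0 - s) ** 3)
    pe = 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs - 3)
    return centre, pe


if __name__ == "__main__":
    for r in (0.372, 0.537):
        m, pe = m_ordinary_formula11(r, 70)
        print(f"r = {r}: M = {m:.1f} ± {pe:.1f}")
    # r = 0.372: M = 41.8 ± 4.7
    # r = 0.537: M = 53.1 ± 5.0
```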

The foregoing results have been tested by means of series adjusted with 
several different formulas, and the agreement between computation and ob- 
servation was as close as could fairly be expected. To illustrate, let us take 
from the Smithsonian Report of 1873, p. 335, the signs of residuals which 
occur in an experience table of mortality adjusted by the 21-term formula 
of Table B. (Analyst, May 1877, p. 82.) Here we have r = .372 and 
N= 70. The probable total of signs falling within groups of only one or 
two, as computed by (11), is 

$$ M = 41.8 \pm 4.7, $$

while the observed number, as counted in the table, omitting the first and last signs, is 46.






[Table: signs (+ or −) of the residuals for terms 1 to 70 of the adjusted experience table of mortality.]



The agreement is sufficient to justify the conclusion that the adjusting pro- 
cess probably has not introduced any new errors into the series, and that 
the adjustment is good as far as it goes. The same series adjusted by the 
9-term formula of Table B, for which $r = .537$, gives by computation

$$ M = 53.1 \pm 5.0, $$

while observation shows $M = 53$.

A fictitious series of 100 terms was constructed by the aid of the system 
of errors of equal frequency, in Interpolation etc., p. 24, and was adjusted 
by the 25-term formula of Table C. Here the series was periodic, and $r = .311$, and formula (8) gave

$$ M = 58.0 \pm 5.6. $$

The number actually observed was $M = 56$.

A few general remarks on the subject of tests of good adjustment will be 
appropriate here. If we could construct the true series, so as to have $r = 0$, then the most probable value of the arithmetical mean of $(v/\varepsilon_1)^2$ would be unity, as in formula (2), and the most probable number $M$ of signs falling in groups of only one or two signs each, in a periodic series, would be $\tfrac12 N$,
as in formula (9). Practically, this state of things cannot be quite realized, 
for it is probable that, even by repeated adjustments, we can never reduce 
the value of r much below somewhere from \ to \. But any adjusted 
series ought to satisfy, approximately at least, the condition that the observed mean of $(v/\varepsilon_1)^2$ should be as in formula (3). If it is greater than the upper value

$$ (1 - r^2)\left(1 + .6745\sqrt{2/N}\right), $$




there will be some reason to fear that the series has been smoothed out too 
much, and that large values of v have been caused by departures from the 
normal form of the series. The same will be true if the observed number 
$M$ is smaller than the lower value given by formula (8), and similarly when (11) is used.
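The two warning signs just described can be checked together. The sketch below (hypothetical names) flags an adjustment when the observed mean of $(v/\varepsilon_1)^2$ exceeds the upper value of (3), or when the observed $M$ falls below the lower value of (8) for a periodic series or (11) for an ordinary one. The mean-square check is skipped when $\varepsilon_1$ is not available. Applied to the 21-term adjustment of the mortality table ($r = .372$, $N = 70$, observed $M = 46$), it raises no warning.

```python
import math
from typing import List, Optional, Tuple


def m_limits(r: float, n_signs: int, periodic: bool) -> Tuple[float, float]:
    """(lower, upper) value of M from formula (8) if periodic, else formula (11)."""
    s = r * r
    denom = 4.0 * (1.0 - s) ** 3
    if periodic:
        centre = (2.0 - 3.0 * s) * n_signs / denom
        pe = 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs)
    else:
        centre = (2.0 - 3.0 * s) * (n_signs - 2.5) / denom
        pe = 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs - 3)
    return centre - pe, centre + pe


def oversmoothing_warnings(r: float, n_signs: int, m_observed: int,
                           mean_sq_ratio: Optional[float] = None,
                           periodic: bool = True) -> List[str]:
    """Signs that the series may have been smoothed out too much."""
    warnings = []
    if mean_sq_ratio is not None:
        upper = (1.0 - r * r) * (1.0 + 0.6745 * math.sqrt(2.0 / n_signs))
        if mean_sq_ratio > upper:
            warnings.append("mean of (v/eps1)^2 exceeds the upper value of (3)")
    lower, _ = m_limits(r, n_signs, periodic)
    if m_observed < lower:
        warnings.append("observed M is below the lower value of (8) or (11)")
    return warnings


if __name__ == "__main__":
    print(oversmoothing_warnings(0.372, 70, 46, periodic=False))  # []
```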

The mere fact that the conditions (3) and (8) are satisfied, however, is not of itself sufficient evidence of good adjustment. For suppose the adjustment were the least possible, so that the adjusted series did not differ sensibly from the unadjusted one; then we shall have everywhere $v = 0$ and $\varepsilon' = \varepsilon$ and $r = 1$, so that the mean of $(v/\varepsilon_1)^2$ would be zero and the expression (3) would also be zero, and the condition would be fully satisfied though no adjustment had been made. The same may be said respecting a condition employed by another writer, and remarked upon somewhat imperfectly in Interpolation etc., pp. 14 to 16. The condition reduces to this: when a law of mortality or other similar series has been adjusted by means of an empirical equation containing $k$ constants or parameters, the mean of $(v/\varepsilon_1)^2$ ought to be approximately

$$ 1 - \frac{k}{N}. $$

This is true if the equation is capable of expressing the true law of the 
series, and if the values of the k constants have been determined by the 
method of least squares. When they are found by any other method, the 
mean will be a little larger. Now in a rate of mortality the precise analyt- 
ical form of the series is not known in advance, and the number k is one of 
the things to be determined. Suppose $k = N$. The given series can be represented exactly by an algebraic curve which has $N$ parameters. Here, then, we shall have everywhere $v = 0$, so that the mean of $(v/\varepsilon_1)^2$ is zero and $1 - k/N$ is also zero, and this condition of good adjustment is satisfied where no adjustment at all has been made. The inference is that an equation which satisfies this condition cannot be considered the true one, or the best one possible, unless it also appears that the condition cannot be satisfied by any other equation of simpler form or containing fewer constants. We conclude, then, that if we use an adjustment formula such as (1), a sufficient test will require that the mean of $(v/\varepsilon_1)^2$ or the value of $M$ be brought as near to their ultimate values 1 and $\tfrac12 N$ respectively as they can be without violating the conditions (3) and (8). In other words, the most accurate adjustment will probably be made by that formula whose error-ratio $r$ is the least, provided that the conditions (3) and (8) or (11) are approximately satisfied.
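The selection rule in the last sentence can be sketched as follows, considering only the sign-grouping condition; the mean-square condition (3) would be added in the same way when $\varepsilon_1$ is known. Among the candidate formulas whose observed $M$ is not below the lower value of (8) or (11), the sketch picks the one with the least $r$. The `Candidate` record and the function names are hypothetical. The sample figures are the two adjustments of the mortality table reported above, and the sketch merely applies the rule to them.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str          # label for the adjustment formula
    r: float           # its error-ratio
    m_observed: int    # signs counted in groups of only one or two


def m_lower(r: float, n_signs: int, periodic: bool) -> float:
    """Lower value of M from formula (8) if periodic, else formula (11)."""
    s = r * r
    factor = (2.0 - 3.0 * s) / (4.0 * (1.0 - s) ** 3)
    if periodic:
        return factor * n_signs - 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs)
    return factor * (n_signs - 2.5) - 0.533 * (1.0 + s / 2.0) * math.sqrt(n_signs - 3)


def best_formula(candidates: List[Candidate], n_signs: int,
                 periodic: bool) -> Optional[Candidate]:
    """Least-r candidate among those whose observed M is not too small."""
    admissible = [c for c in candidates
                  if c.m_observed >= m_lower(c.r, n_signs, periodic)]
    return min(admissible, key=lambda c: c.r, default=None)


if __name__ == "__main__":
    cands = [Candidate("Table B, 21-term", 0.372, 46),
             Candidate("Table B, 9-term", 0.537, 53)]
    print(best_formula(cands, 70, periodic=False).name)  # Table B, 21-term
```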



